# Multimodal document understanding

Vintern 1B V2 ViTable Docvqa

A fine-tuned version of the 5CD-AI/Vintern-1B-v2 multimodal model for Vietnamese document question answering on tabular data.

Transformers Other

H2OVL Mississippi 2B

H2OVL-Mississippi-2B is a general-purpose vision-language model developed by H2O.ai. It has 2 billion parameters and handles multimodal tasks such as image captioning, visual question answering (VQA), and document understanding.

Transformers English
